From Task-Specific AI to General-Purpose Large Language Models

A Paradigm Shift in Artificial Intelligence

1. From Specific to General

The field of AI has undergone a major transformation in how models are trained and deployed.

Old Paradigm (Task-Specific Training): Models such as early CNNs or task-specific BERT variants were trained for a single purpose (for example, sentiment analysis only). You needed a separate model for translation, summarization, and so on.
New Paradigm (Centralized Pre-training + Prompting): A single large model (an LLM) learns general world knowledge from internet-scale datasets. It can then be steered to perform almost any language task simply by changing the input prompt.
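
To make the contrast concrete, here is a minimal sketch of the new paradigm: one model, several tasks, and only the prompt changes. The template wording, the `build_prompt` helper, and the `fake_llm` stand-in are all assumptions made for illustration; a real system would replace `fake_llm` with a call to an actual model.

```python
# Hypothetical prompt templates: one per task, all sent to the same model.
PROMPT_TEMPLATES = {
    "sentiment": "Classify the sentiment of this text as positive or negative:\n{text}",
    "translation": "Translate this text into French:\n{text}",
    "summary": "Summarize this text in one sentence:\n{text}",
}


def build_prompt(task: str, text: str) -> str:
    """Return the prompt that steers the single general model toward `task`."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): echoes the instruction line."""
    instruction = prompt.split("\n", 1)[0]
    return f"[model output for: {instruction}]"


if __name__ == "__main__":
    review = "The battery lasts all day and the screen is gorgeous."
    # Same model function every time; only the prompt differs per task.
    for task in PROMPT_TEMPLATES:
        print(fake_llm(build_prompt(task, review)))
```

Under the old paradigm each key in `PROMPT_TEMPLATES` would correspond to a separately trained model rather than a string.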

2. Architectural Evolution

Encoder-Only (BERT Era): Focused on understanding and classification. These models read text bidirectionally to capture deep context, but they are not designed to generate new text.
Decoder-Only (GPT/Llama Era): The modern standard for generative AI. These models use auto-regressive modeling to predict the next token, which makes them well suited to open-ended generation and conversation.
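
The difference between the two architectures can be sketched through what each position is allowed to see. The example below is a toy illustration, not a transformer implementation: the mask helpers, the `generate` loop, and the lookup-table "model" (`TOY_TABLE`, `toy_next_token`) are hypothetical names introduced here.

```python
from __future__ import annotations


def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder-style visibility: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """Decoder-style visibility: position i may attend only to positions <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def generate(next_token, prompt: list[str], max_new_tokens: int) -> list[str]:
    """Auto-regressive loop: predict one token, append it, and repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token is None:  # the toy model has nothing more to say
            break
        tokens.append(token)
    return tokens


# Toy stand-in for a trained model (assumption): a lookup on the last token.
TOY_TABLE = {"the": "cat", "cat": "sat", "sat": "down"}


def toy_next_token(tokens: list[str]) -> str | None:
    return TOY_TABLE.get(tokens[-1])


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    print(generate(toy_next_token, ["the"], max_new_tokens=5))
```

The lower-triangular causal mask is what lets a decoder-only model be trained on next-token prediction and then reused unchanged for generation, since it never relies on tokens that do not exist yet.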

3. Key Drivers of the Shift

Self-Supervised Learning: Training on massive amounts of unlabeled internet data, which removes the bottleneck of human annotation.
Scaling Laws: The empirical observation that model performance improves predictably as model size (parameters), data volume, and compute increase.
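
Both drivers can be sketched in a few lines. The first function shows why no human labels are needed: the label for each prefix is simply the token that follows it in the raw text. The second uses a power-law form that is common in the scaling-law literature; its constants are placeholders chosen for illustration, not fitted values, and the function names are hypothetical. The compute estimate uses the widely quoted rule of thumb of roughly 6 FLOPs per parameter per training token, which is an approximation.

```python
from __future__ import annotations


def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Self-supervision: each prefix is an input, the following token its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def estimated_loss(n_params: float, n_tokens: float,
                   e: float = 1.5, a: float = 400.0, b: float = 400.0,
                   alpha: float = 0.3, beta: float = 0.3) -> float:
    """Power-law form L = E + A / N**alpha + B / D**beta.

    All default constants are illustrative placeholders (assumption).
    """
    return e + a / n_params ** alpha + b / n_tokens ** beta


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb compute estimate: about 6 * N * D FLOPs (approximation)."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    for context, label in next_token_pairs("the cat sat down".split()):
        print(context, "->", label)
    for n, d in [(1e8, 2e9), (1e9, 2e10), (1e10, 2e11)]:
        print(f"N={n:.0e} D={d:.0e} "
              f"loss~{estimated_loss(n, d):.3f} "
              f"flops~{approx_training_flops(n, d):.1e}")
```

With these placeholder constants the estimated loss falls smoothly as parameters and tokens grow together, which is the qualitative behavior the scaling-law observation describes.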

Key Insight

AI has shifted from "task-specific tools" to "general-purpose agents" that exhibit emergent abilities such as logical reasoning and in-context learning.
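
In-context learning means the model picks up a task from examples placed in the prompt itself, with no weight updates. A minimal sketch of how such a few-shot prompt might be assembled follows; the `few_shot_prompt` helper and the `Input:`/`Label:` layout are assumptions for illustration, not a required format.

```python
from __future__ import annotations


def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Build a prompt with worked examples followed by the unanswered query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Label:")  # left open for the model to complete
    return "\n".join(lines)


if __name__ == "__main__":
    demo = few_shot_prompt(
        "Classify each message as Spam or Not Spam.",
        [("Win a free cruise now!!!", "Spam"),
         ("Meeting moved to 3pm tomorrow.", "Not Spam")],
        "Claim your prize by clicking this link.",
    )
    print(demo)
```

Changing the examples changes the task the model performs, which is what distinguishes in-context learning from retraining.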


Question 1

What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?

Moving from cloud computing to local processing.

Moving from task-specific training to centralized pre-training with prompting.

Moving from Python to C++ for model development.

Moving from Decoder-only to Encoder-only architectures.

Question 2

According to Scaling Laws, what three factors fundamentally link to model performance?

Internet speed, RAM size, and CPU cores.

Human annotators, code efficiency, and server location.

Model size (parameters), data volume (tokens), and total computation.

Prompt length, temperature setting, and top-k value.

Challenge: Evaluating Architectural Fitness

Apply your knowledge of model architectures to real-world scenarios.

You are an AI architect tasked with selecting the right foundational approach for two different projects. You must choose between an Encoder-only (like BERT) or a Decoder-only (like GPT) architecture.

Task 1

You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?

Solution: Encoder-only (e.g., BERT)

Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.

Task 2

You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?

Solution: Decoder-only (e.g., GPT/Llama)

This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.